1 Introduction

Bar plots are one of the most common chart types and come in several varieties. In the previous lesson, we learned how to make bar plots and their circular counterparts with {ggplot2}.

This lesson pivots from group comparisons to the practice of labeling in data visualization. Labels provide additional context, clarify data points, and enhance the overall readability of a plot. We’ll look closely at labeling in {ggplot2}, focusing on the geom_label() and geom_text() functions.

2 Learning Objectives

After this lesson, you will be able to:

  1. Use two different text geoms to label ggplots:
    • geom_text() for simple labels
    • geom_label() for emphasized labels
  2. Transform and summarize data into the appropriate format for different chart types.
  3. Adjust text placement to position labels on stacked, dodged, and percent-stacked bar plots.
  4. Adjust text placement to position labels on pie charts and donut plots.

3 Packages

We’ll utilize a combination of packages in this lesson to enhance our data visualizations:

  1. tidyverse: A collection of R packages for efficient data manipulation and visualization, including ggplot2.

  2. glue: Enables flexible string interpolation for dynamic text in plots.

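  3. here: For project-relative file paths.

  4. patchwork: Combines several ggplots into a single figure.

  5. ggthemr: Provides ready-made themes for ggplot2.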

pacman::p_load(tidyverse, glue, here, ggthemr, patchwork)

4 Introduction to text geoms in {ggplot2}

We’ll start with geom_text() for simple labeling and then move to geom_label() for labels with more emphasis. We will show how to use these geoms on simple bar plots, then go into more detail on how to use them for stacked bars, dodged bars, normalized stacked bars, and circular plots.

First let’s practice using these functions on a simple bar plot made with fake data. Once we cover the fundamentals of the labeling syntax, we will apply these to real epidemiology data.

# Create example data frame
data <- data.frame(
  category = c("A", "B", "C"),
  count = c(10, 20, 15)
)

# Create the bar plot
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue")

We can easily add labels to our bars by adding geom_text() and telling aes() which column to take the label text from:

ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count)) # provide variable to `label` argument

As you can see, a single extra line of code is enough to label the bars.

However, the placement of the text is not ideal: each label is centered on the top edge of its bar, so it sits half on the bar and half above it. The labels are also quite small and difficult to make out. We can address this by making them bigger and adjusting their vertical placement.

To do this, we will nudge the text upwards using the nudge_y argument. We will also increase the size of the text using the size argument.

ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count), 
            nudge_y = 1, # move text up
            size = 5)    # make text bigger

Note that the value of nudge_y is in the same units as the y-axis.
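
One way to check this is to inspect the positions that ggplot2 computes, using layer_data(). The sketch below rebuilds the fake data under an illustrative name (`toy`, not part of the lesson) and compares the label positions with the bar heights.

```r
library(ggplot2)

# Same fake data as above, under a hypothetical name
toy <- data.frame(
  category = c("A", "B", "C"),
  count = c(10, 20, 15)
)

p <- ggplot(toy, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count), nudge_y = 1)

# Layer 2 is geom_text(); its y values are the counts shifted by nudge_y
label_y <- layer_data(p, 2)$y
label_y - toy$count
```

Each difference equals the nudge_y value, expressed in y-axis units.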

Let’s try nudging the text down by setting nudge_y to a negative value:

ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count), 
            nudge_y = -3) # move text down

If we made a horizontal bar plot, we would need to nudge the text to the right or left using the nudge_x argument:

ggplot(data, aes(x = count, y = category)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count), 
            nudge_x = 1) # move text to the right

Now let’s see how the geom_label() function works. We can use the same code as above, but replace geom_text() with geom_label():

ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count), 
            nudge_y = -3)

As you can see, geom_label() draws a rectangle behind the text, making it easier to read.

In this code, the fill aesthetic in geom_label() can be adjusted to control the background fill color of the labels. For example, let’s make the background dark blue, and the text white:

ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count), 
            nudge_y = -3,
            fill = "royalblue4",
            color = "white")

Two distinct text geoms

Text geoms are useful for labeling plots. They can be used in combination with other geoms, such as geom_col(), to annotate the height of bars.

  • geom_text() adds only text to the plot

  • geom_label() draws a rectangle behind the text, making it easier to read

Setting a global {ggplot2} theme

Rather than adding a theme function to each plot, we can use the theme_set() function to set a global theme for the rest of our plots.

Let’s define a custom theme that combines theme_light() with large, bold axis titles:

theme_light_custom <- 
  theme_light() +
  theme(
    axis.title = element_text(size = 16, face = "bold")
  )

Now we can set this theme as the default for all plots:

theme_set(theme_light_custom)

Now theme_light_custom will be applied automatically to every plot you draw.

For example, let’s redraw the plot we made earlier:

ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count), 
            nudge_y = -3,
            fill = "royalblue4",
            color = "white")

This is a great way to ensure that all of your plots have a consistent look and feel.
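
If you later want to go back to the previous default, it helps to know that theme_set() invisibly returns the theme that was active before the call. The sketch below stores that value under a hypothetical name (`old_theme`) and restores it afterwards.

```r
library(ggplot2)

theme_light_custom <-
  theme_light() +
  theme(axis.title = element_text(size = 16, face = "bold"))

# theme_set() invisibly returns the theme that was active before the call
old_theme <- theme_set(theme_light_custom)

# theme_get() returns the currently active theme
active_theme <- theme_get()

# Restore the previous theme once the custom one is no longer needed
theme_set(old_theme)
restored_theme <- theme_get()
```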

5 The vjust and hjust arguments

Rather than using nudge_x and nudge_y to adjust the position of text, we can use the vjust and hjust arguments. These arguments adjust the vertical and horizontal justification of the text, respectively. It is notoriously difficult to understand exactly how these work, but we will introduce their basic functionality here.

5.1 Understanding hjust (horizontal justification)

The hjust argument in ggplot2 adjusts the horizontal position of text labels relative to their anchor points (the actual data points). hjust values typically range from 0 to 1, where:

  • hjust = 0 aligns the text label’s left edge with the anchor point.
  • hjust = 0.5 centers the text label on the anchor point.
  • hjust = 1 aligns the text label’s right edge with the anchor point.

Here’s a simple example to illustrate this. First, let’s make a plot with a single point and text with no hjust argument:

# Example data
df <- data.frame(x = 1, y = 1)

# Base plot with a point
base_p <- ggplot(df, aes(x, y)) + geom_point() + theme_void()

base_p + geom_text(aes(label = "text"))

With no hjust argument, the text is centered on the point, which means that the default value of hjust is 0.5.

Now let’s try setting hjust to a variety of values:

p_hjust_0 <- base_p + geom_text(aes(label = "hjust=0"), hjust = 0)
p_hjust_0.25 <- base_p + geom_text(aes(label = "hjust=0.25"), hjust = 0.25)
p_hjust_0.5 <- base_p + geom_text(aes(label = "hjust=0.5"), hjust = 0.5)
p_hjust_0.75 <- base_p + geom_text(aes(label = "hjust=0.75"), hjust = 0.75)
p_hjust_1 <- base_p + geom_text(aes(label = "hjust=1"), hjust = 1)

# Combine plots with patchwork
p_hjust_0 / p_hjust_0.25 / p_hjust_0.5 / p_hjust_0.75 / p_hjust_1

As you can see, when hjust = 0 the left edge of the text sits on the point, so the text extends to the right of it. When hjust = 1 the right edge sits on the point, so the text extends to the left. Values in between move the text progressively closer to being centered on the point.

While hjust was originally meant to be used between 0 and 1, you can actually use values below 0 or above 1. The offset is measured in multiples of the text width. For example, if you set hjust = -0.2, the text lies to the right of the anchor point, with a gap of 20% of the text width between the anchor and the text’s left edge. If you set hjust = 1.2, the text lies to the left of the anchor point, with a gap of 20% of the text width between the text’s right edge and the anchor:

p_hjust_neg0.5 <- base_p + geom_text(aes(label = "hjust=-0.5"), hjust = -0.5)
p_hjust_neg0.2 <- base_p + geom_text(aes(label = "hjust=-0.2"), hjust = -0.2)
p_hjust_1.2 <- base_p + geom_text(aes(label = "hjust=1.2"), hjust = 1.2)
p_hjust_1.5 <- base_p + geom_text(aes(label = "hjust=1.5"), hjust = 1.5)

# Combine plots with patchwork
p_hjust_neg0.5 / p_hjust_neg0.2 / p_hjust_0 / p_hjust_0.25 / p_hjust_0.5 / p_hjust_0.75 / p_hjust_1 / p_hjust_1.2 / p_hjust_1.5
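
The arithmetic behind these placements can be sketched with a small helper. Note that label_edges() is a hypothetical function written for this illustration, not part of ggplot2, and it is a simplified model: it returns where the left and right edges of a label fall relative to the anchor point (at 0), given hjust and the label width in arbitrary units.

```r
# Hypothetical helper: edges of a label relative to its anchor point (0),
# for a given hjust and label width
label_edges <- function(hjust, width) {
  left <- -hjust * width
  c(left = left, right = left + width)
}

label_edges(hjust = 0,    width = 10)  # left edge on the anchor
label_edges(hjust = 0.5,  width = 10)  # centered on the anchor
label_edges(hjust = 1,    width = 10)  # right edge on the anchor
label_edges(hjust = -0.2, width = 10)  # gap of 2 between anchor and left edge
label_edges(hjust = 1.2,  width = 10)  # gap of 2 between right edge and anchor
```

Because the gap is hjust multiplied by the width, a wider label ends up with a larger gap for the same out-of-range hjust value.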

Using hjust and vjust values outside the 0-1 range can be problematic when your labels are not the same length. Because the offset is proportional to the text width, setting hjust = 1.2 pushes longer labels further away from the anchor point than shorter ones.

For example:

# Different text labels with varying lengths
p_xx <- base_p + geom_text(aes(label = "xxx"), hjust = 1.5)
p_xxxx <- base_p + geom_text(aes(label = "xxxxxx"), hjust = 1.5)
p_xxxxxx <- base_p + geom_text(aes(label = "xxxxxxxxx"), hjust = 1.5)

# Combine plots with patchwork
p_xx / p_xxxx / p_xxxxxx

As you can see, the gap between the anchor point and the text grows with the length of the label. This is because hjust = 1.5 leaves a gap equal to 50% of the text width between the text’s right edge and the anchor point, so longer labels end up further away.

If this is a problem for you, you can use the nudge_x argument to adjust the position of the labels instead. There are certain times when using nudges can be problematic though, which is why hjust and vjust are still useful.
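
As a rough sketch of that alternative, the example below combines hjust = 1 with a small negative nudge_x, so that every label is shifted by the same distance regardless of its length. The data frame `labs_df` and the offset of 0.05 are made up for illustration; nudge_x is expressed in x-axis units.

```r
library(ggplot2)

# Hypothetical labels of different lengths, all anchored at x = 1
labs_df <- data.frame(
  x = 1,
  y = 1:3,
  label = c("xxx", "xxxxxx", "xxxxxxxxx")
)

p_fixed_gap <- ggplot(labs_df, aes(x, y)) +
  geom_point() +
  geom_text(aes(label = label),
            hjust = 1,        # right edge of every label on its anchor...
            nudge_x = -0.05)  # ...then shift all anchors left by the same amount

# Every label anchor is moved by the same distance, whatever the label length
label_x <- layer_data(p_fixed_gap, 2)$x
label_x
```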

5.2 Understanding vjust (vertical justification)

Similarly, the vjust argument in ggplot2 adjusts the vertical position of text labels in relation to their anchor points. vjust values also typically range from 0 to 1, where:

  • vjust = 0 aligns the bottom edge of the text label with the anchor point.
  • vjust = 0.5 centers the text label vertically on the anchor point.
  • vjust = 1 aligns the top edge of the text label with the anchor point.

Here’s an example to illustrate vjust. We’ll start with the same base plot and add text with no vjust argument:

# Base plot with a point
p <- ggplot(df, aes(x, y)) + geom_point() + theme_void()

p + geom_text(aes(label = "text"))

By default, with no vjust specified, the text is vertically centered on the point, indicating the default value of vjust is 0.5.
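
If you prefer to confirm these defaults in code rather than by eye, one option is to look at the default aesthetics stored on the geom object. This sketch assumes that GeomText$default_aes stores hjust and vjust as plain numbers, which is the case in the ggplot2 versions we are aware of.

```r
library(ggplot2)

# Default aesthetics of geom_text(); hjust and vjust are both expected to be 0.5
default_hjust <- GeomText$default_aes$hjust
default_vjust <- GeomText$default_aes$vjust
c(hjust = default_hjust, vjust = default_vjust)
```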

Now, let’s experiment with different vjust values:

p_vjust_0 <- p + geom_text(aes(label = "vjust=0"), vjust = 0)
p_vjust_0.25 <- p + geom_text(aes(label = "vjust=0.25"), vjust = 0.25)
p_vjust_0.5 <- p + geom_text(aes(label = "vjust=0.5"), vjust = 0.5)
p_vjust_0.75 <- p + geom_text(aes(label = "vjust=0.75"), vjust = 0.75)
p_vjust_1 <- p + geom_text(aes(label = "vjust=1"), vjust = 1)

# Combine plots with patchwork
p_vjust_0 / p_vjust_0.25 / p_vjust_0.5 / p_vjust_0.75 / p_vjust_1

Here, vjust = 0 places the bottom edge of the text on the point, so the text sits just above it, while vjust = 1 places the top edge on the point, so the text sits just below it. As vjust approaches 0.5, the text moves closer to being vertically centered on the point.

Like hjust, vjust can also take values outside the 0 to 1 range. For example, vjust = -0.2 would place the text slightly above the anchor point, and vjust = 1.2 would place it slightly below. Let’s see how these values affect text positioning:

p_vjust_neg0.5 <- p + geom_text(aes(label = "vjust=-0.5"), vjust = -0.5)
p_vjust_1.5 <- p + geom_text(aes(label = "vjust=1.5"), vjust = 1.5)

# Combine plots with patchwork
p_vjust_neg0.5 / p_vjust_0 / p_vjust_0.25 / p_vjust_0.5 / p_vjust_0.75 / p_vjust_1 / p_vjust_1.5

As with hjust, using vjust values beyond the typical 0 to 1 range can be useful for fine-tuning the placement of your text labels, allowing them to extend slightly above or below the anchor point.

6 Data Example: TB treatment outcomes in Benin

Let’s apply what we’ve learned to a real dataset.

The tb_outcomes dataset, which we used in the previous lesson, will serve as the foundation for our examples:

tb_outcomes <- read_csv(here::here('data/benin_tb.csv'))
tb_outcomes

We start with a simple bar plot of the number of TB cases by hospital. For this, let’s pre-calculate the total number of cases per hospital using the summarize() function:

hospital_sums <- 
  tb_outcomes %>% 
  group_by(hospital) %>% 
  summarize(cases = sum(cases))

hospital_sums
## # A tibble: 6 × 2
##   hospital         cases
##   <chr>            <dbl>
## 1 CHPP Akron         875
## 2 CS Abomey-Calavi   791
## 3 Hopital Bethesda   256
## 4 Hopital Savalou     80
## 5 Hopital St Luc     168
## 6 St Jean De Dieu    171
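
As an aside, the same totals can be obtained in one step with count() and its wt argument. The sketch below uses a small made-up data frame (`toy_tb`, with invented numbers) rather than the real tb_outcomes data, so that it can be run on its own.

```r
library(dplyr)

# Hypothetical stand-in for tb_outcomes, with made-up numbers
toy_tb <- tibble(
  hospital = c("Hospital A", "Hospital A", "Hospital B", "Hospital B"),
  cases = c(10, 5, 7, 3)
)

# group_by() + summarize(), as in the lesson
sums_a <- toy_tb %>%
  group_by(hospital) %>%
  summarize(cases = sum(cases))

# count() with a weight sums the cases column per hospital in one step
sums_b <- toy_tb %>%
  count(hospital, wt = cases, name = "cases")
```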

Now let’s use hospital_sums to visualize each hospital’s total number of cases and use geom_text() to annotate the bars:

ggplot(hospital_sums, aes(x = hospital, y = cases)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = cases), 
            vjust = -0.1)

Further aesthetic modifications

So far we have only used some of the possible aesthetics for geom_text(). The three required aesthetics are x, y, and label. These must be mapped to variables inside aes().

Additional aesthetics include:

  • size: the size of the text, in mm
  • angle: the angle of the text, from 0 to 360
  • alpha: the transparency of the text, from 0 to 1
  • color: the color of the text
  • family: the font family of the text, such as “sans”, “serif”, “mono”
  • fontface: the font face of the text, including “plain”, “bold”, “italic”, “bold.italic”
  • group: a grouping variable for the text
  • hjust: horizontal justification of the text, typically from 0 to 1
  • vjust: vertical justification of the text, typically from 0 to 1
  • lineheight: the spacing between lines of multi-line text, as a multiple of the text height (default 1.2)

nudge_y and nudge_x are also available, but are not formally considered aesthetics, as they cannot be mapped to a variable inside aes(), and must be set outside of it.

Here is an example plot with most of these aesthetics set. Try modifying the code to see how each aesthetic changes the plot:

ggplot(hospital_sums, aes(x = hospital, y = cases)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = paste(cases, "\ncases")), 
            size = 5,
            angle = 0,
            alpha = 0.5,
            color = "black",
            family = "mono",
            fontface = "bold",
            hjust = 0.5,
            vjust = 1,
            nudge_y = -10,
            lineheight = 0.8) + 
  theme(axis.text.x = element_text(angle = 90))
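
Label text like this can also be built with the {glue} package loaded at the start of the lesson. The following is a minimal sketch using a made-up data frame (`toy_sums`) in place of hospital_sums; glue_data() fills each {placeholder} with the matching column value, row by row.

```r
library(ggplot2)
library(glue)

# Hypothetical totals, standing in for hospital_sums
toy_sums <- data.frame(
  hospital = c("Hospital A", "Hospital B"),
  cases = c(875, 80)
)

# Build one label per row from the cases column
toy_sums$label <- as.character(glue_data(toy_sums, "{cases} cases"))

ggplot(toy_sums, aes(x = hospital, y = cases)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = label), vjust = -0.5)
```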

Side-note: Enhancing Text Labels with ggtext

For those seeking more sophisticated control over text formatting in ggplot2, the {ggtext} package may come in handy. It allows the use of Markdown and a subset of HTML and CSS to format text elements, including options to embolden, italicize, change color and size, add superscripts/subscripts, and even embed images. Notably, you can apply multiple styles within the same text element, opening up new levels of creativity and customization.

Consider the example below, which uses {ggtext} for the plot title, subtitle and bar labels:

pacman::p_load(tidyverse, ggtext, medicaldata)

# Data and Plot
medicaldata::strep_tb %>% 
  count(gender) %>% 
  mutate(gender_label = paste0("**<span style='font-size:16pt'>", n, "</span>**", 
                               if_else(gender == "M", " men", " women"))) %>% 
  ggplot(aes(x = gender, fill = gender, y = n)) +
  geom_col() +
  scale_fill_manual(values = c("M" = "#ee6c4d", "F" = "#424874")) +
  labs(
    title = "<b><span style='color:#424874; font-size:19pt'>Female</span> vs
    <span style='color:#ee6c4d; font-size:19pt'>Male</span> 
    Patients in Strep Study</b>",
    subtitle = "<span style='color:gray60'>A demonstration of custom text labels with </span>**{ggtext}**") +
  theme_classic() +
  theme(plot.title = element_textbox_simple(), 
        plot.subtitle = element_textbox_simple(),
        legend.position = "none", 
        axis.text.x = element_blank()) +
  geom_richtext(aes(label = gender_label, y = n/2), 
                label.r = grid::unit(5, "pt"), fill = "white")

While the HTML and CSS involved might seem daunting at first, remember that resources like ChatGPT (and the web more broadly) can help you navigate these.

To learn more about {ggtext}, visit the package website.

7 Labeling stacked bar plots

So far, we’ve only looked at bar plots with a single categorical variable. Let’s build plots with two categorical variables and add labels to each subgroup. We’ll start with stacked bar plots.

We summarize the tb_outcomes dataset by period_date and diagnosis_type, calculating the sum of cases (cases) for each group.

# Summarize the data by period and diagnosis type
tb_sum <- tb_outcomes %>% 
  group_by(period_date, diagnosis_type) %>% 
  summarise(cases = sum(cases))

tb_sum

Now, let’s create a simple stacked bar plot and see how to add labels to it:

# Create a basic bar plot using the summarized data
quarter_dx_bar <- tb_sum %>% 
  ggplot(aes(x = period_date, y = cases, fill = diagnosis_type)) +
  geom_col() + 
  labs(title = "New and relapse TB cases per quarter",
       subtitle = "Data from six health facilities in Benin, 2015-2017")

quarter_dx_bar

We’ll use the cases column for labeling each bar:

# Add text labels to the bar plot
quarter_dx_bar +
  geom_text(aes(label = cases))

Oops, the labels are not in the right place! They don’t align with the height of the bars in our plot.

The issue is that geom_text() does not stack positions by default like geom_col(). We must explicitly set position = "stack" in geom_text():

# Place text at the top of each bar segment
quarter_dx_bar +
  geom_text(aes(label = cases),
            position = "stack") # Set position to stack

Great!
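
To see what the position adjustment changes, we can compare the computed label positions with and without it. This sketch uses a tiny made-up data frame (`toy_stack`) with one quarter and two diagnosis types, and reads the positions with layer_data().

```r
library(ggplot2)

# Hypothetical data: one quarter, two diagnosis types
toy_stack <- data.frame(
  period = "Q1",
  type = c("bacteriological", "clinical"),
  cases = c(30, 10)
)

base <- ggplot(toy_stack, aes(x = period, y = cases, fill = type)) +
  geom_col()

# Without a position adjustment, labels sit at the raw values
y_raw <- layer_data(base + geom_text(aes(label = cases)), 2)$y

# With position = "stack", labels sit at the top of each stacked segment
y_stacked <- layer_data(
  base + geom_text(aes(label = cases), position = "stack"), 2
)$y

sort(y_raw)
sort(y_stacked)
```

The raw positions are the case counts themselves, whereas the stacked positions are cumulative heights, which is why the unadjusted labels do not line up with the bar segments.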

To vertically align the text inside the bars, we can add vjust to geom_text():

# Reposition labels inside the bar segments for clarity
quarter_dx_bar +
  geom_text(aes(label = cases),
    position = position_stack(),
    vjust = 1.5)

This works well: the labels are now inside the bars. Setting vjust = 1.5 shifts each label down so that a gap of half the label’s height separates it from the top of its segment.

But what if we want to center the labels vertically within each bar segment? To do this, we set vjust = 0.5 inside position_stack() rather than in geom_text(). Here vjust controls where each label falls within its segment: 0 is the bottom, 0.5 is the middle, and 1 (the default) is the top.

quarter_dx_bar +
  geom_text(aes(label = cases),
            position = position_stack(vjust = 0.5))

Now the labels are vertically centered within each bar segment.
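
For intuition, the midpoints can also be computed by hand. The sketch below, again on made-up data (`toy_stack`), assumes the default stacking order in which the last group sits at the bottom of the stack, and compares the manual result with what position_stack(vjust = 0.5) computes.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: one quarter, two diagnosis types
toy_stack <- tibble(
  period = "Q1",
  type = c("bacteriological", "clinical"),
  cases = c(30, 10)
)

# Manual midpoints: stack from the last group upwards,
# then step back by half of each segment's height
toy_mid <- toy_stack %>%
  group_by(period) %>%
  arrange(desc(type), .by_group = TRUE) %>%
  mutate(label_y = cumsum(cases) - cases / 2) %>%
  ungroup()

# Midpoints computed by position_stack(vjust = 0.5)
p_mid <- ggplot(toy_stack, aes(x = period, y = cases, fill = type)) +
  geom_col() +
  geom_text(aes(label = cases), position = position_stack(vjust = 0.5))

manual_y <- sort(toy_mid$label_y)
ggplot_y <- sort(as.numeric(layer_data(p_mid, 2)$y))
```

In practice, position_stack(vjust = 0.5) saves you from this bookkeeping, but the manual version shows what is happening under the hood.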

This label placement is especially nice for horizontal bar plots. Below we flip the axes of our plot using coord_flip() to create a horizontal bar plot, and add some extra aesthetic modifications to make the plot more readable:

ggthemr::ggthemr("fresh") # set theme with ggthemr

quarter_dx_bar +
  geom_text(aes(label = cases),
            position = position_stack(vjust = 0.5),
            # some extra adjustments
            color = "white",
            fontface = "bold") +
  coord_flip()

ggthemr::ggthemr_reset() # reset theme

That looks great! Let’s move on to dodged bar charts now.

8 Labeling dodged bar plots

Dodged bar charts display multiple categories side by side. Let’s explore how to group the data and properly position labels for clear interpretation.

To begin, we’ll group our dataset tb_outcomes by hospital and diagnosis_type, calculating the sum of cases (cases) for each group.

hospital_dx_cases <- tb_outcomes %>% 
  group_by(hospital, diagnosis_type) %>% 
  summarise(cases = sum(cases))

hospital_dx_cases

Next, let’s create a simple dodged bar chart, where the height of each bar represents the total number of cases for a specific diagnosis type in each hospital. The default position for geom_col() is "stack". To create a dodged bar chart, we need to specify position = "dodge", which is shorthand for position = position_dodge().

# Use "dodge" instead of the default "stack"
hospital_dx_bar <- hospital_dx_cases %>% 
  ggplot(aes(x = hospital, y = cases, fill = diagnosis_type)) +
  geom_col(position = "dodge") + 
  theme(axis.text.x = element_text(angle = 90))

hospital_dx_bar

Now, we can annotate the chart with geom_text() to display the labels, just as we’ve done before.

hospital_dx_bar +
  geom_text(aes(label = cases))

Oops, that’s not quite right! The two labels for each hospital are placed in a single vertical line at the center of the group instead of sitting above their own bars. Let’s take a look at how we can fix that.

Just as with our stacked bar chart in the previous section, we need to add the position adjustment to geom_text() but this time we’re going to specify position = position_dodge().

hospital_dx_bar +
  geom_text(aes(label = cases),
            position = position_dodge())

We get the same chart as before. Unlike bars, text labels have no width of their own, so position_dodge() does not know how far apart to place them unless we supply a width argument.

For geom_col(), the default value of width is 0.9. We’ll also use 0.9 for geom_text() to ensure the bars and labels are aligned:

hospital_dx_bar +
  geom_text(aes(label = cases),
            position = position_dodge(width = 0.9))
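
We can verify the alignment by comparing the horizontal positions of the bars and of the labels. The sketch below uses a made-up data frame (`toy_dodge`) with a single hospital and two diagnosis types.

```r
library(ggplot2)

# Hypothetical data: one hospital, two diagnosis types
toy_dodge <- data.frame(
  hospital = "Hospital A",
  type = c("bacteriological", "clinical"),
  cases = c(30, 10)
)

p_dodge <- ggplot(toy_dodge, aes(x = hospital, y = cases, fill = type)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = cases), position = position_dodge(width = 0.9))

# Horizontal centers of the bars (layer 1) and of the labels (layer 2)
bar_x <- sort(as.numeric(layer_data(p_dodge, 1)$x))
label_x <- sort(as.numeric(layer_data(p_dodge, 2)$x))
```

With two groups and a width of 0.9, each bar and its label are centered 0.225 units to either side of the hospital’s position on the x-axis.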

Great! Now all that’s left to do is shift the labels up a bit with vjust.

hospital_dx_bar +
  geom_text(aes(label = cases),
            position = position_dodge(width = 0.9),
            vjust = -0.5)

That looks great! Let’s move on to percent-stacked bar plots.

9 Labeling percent-stacked bar plots

When labeling percent-stacked bar plots, the labels should reflect the percentages of each category. This means we need to format the labels into percentages to ensure they match the segments on the chart. By the end of this section, you’ll have recreated the graph below!

Figure: Percent-stacked bar plot

To get started, we want to visualize the proportion of cases in each hospital belonging to each diagnosis type. So let’s calculate the total number of cases for each health facility (hospital) by diagnosis type.

hosp_dx_sum <- tb_outcomes %>%
  group_by(hospital, diagnosis_type) %>%
  summarise(total_cases = sum(cases))

hosp_dx_sum

We could use this dataset to create a percent-stacked bar plot. You may remember from the last lesson that for percent stacked plots, we set the position to fill in geom_col() to normalize the y axis.

hosp_dx_sum %>%
  ggplot(aes(x = hospital, y = total_cases, fill = diagnosis_type)) +
  geom_col(position = position_fill()) +
  geom_text(aes(label = total_cases),
            position = position_fill(),
            vjust = 1.5) 

So this is a good start but we want percentages, not raw values.

In order to prepare our data, we’ll start by grouping our dataset tb_outcomes by hospital and diagnosis type. Then we’ll calculate the sum of cases for each combination and compute the proportion of bacteriologically confirmed and clinically diagnosed cases.

hosp_dx_prop <- tb_outcomes %>%
  group_by(hospital, diagnosis_type) %>%
  summarise(total_cases = sum(cases)) %>% 
  mutate(prop = total_cases / sum(total_cases))

hosp_dx_prop
## # A tibble: 12 × 4
## # Groups:   hospital [6]
##    hospital         diagnosis_type  total_cases  prop
##    <chr>            <chr>                 <dbl> <dbl>
##  1 CHPP Akron       bacteriological         695 0.794
##  2 CHPP Akron       clinical                180 0.206
##  3 CS Abomey-Calavi bacteriological         671 0.848
##  4 CS Abomey-Calavi clinical                120 0.152
##  5 Hopital Bethesda bacteriological         139 0.543
##  6 Hopital Bethesda clinical                117 0.457
##  7 Hopital Savalou  bacteriological          70 0.875
##  8 Hopital Savalou  clinical                 10 0.125
##  9 Hopital St Luc   bacteriological         149 0.887
## 10 Hopital St Luc   clinical                 19 0.113
## 11 St Jean De Dieu  bacteriological         100 0.585
## 12 St Jean De Dieu  clinical                 71 0.415
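
The reason prop is computed within each hospital is that summarise() drops only the last grouping level (diagnosis_type), leaving the result grouped by hospital. The sketch below makes this explicit with the .groups argument, using made-up data (`toy_tb`) in place of tb_outcomes.

```r
library(dplyr)

# Hypothetical stand-in for tb_outcomes, with made-up numbers
toy_tb <- tibble(
  hospital = c("Hospital A", "Hospital A", "Hospital B", "Hospital B"),
  diagnosis_type = c("bacteriological", "clinical", "bacteriological", "clinical"),
  cases = c(30, 10, 15, 5)
)

toy_prop <- toy_tb %>%
  group_by(hospital, diagnosis_type) %>%
  # Drop the last grouping level explicitly: the result stays grouped by hospital
  summarise(total_cases = sum(cases), .groups = "drop_last") %>%
  # sum(total_cases) is therefore a per-hospital total
  mutate(prop = total_cases / sum(total_cases)) %>%
  ungroup()

# Proportions add up to 1 within each hospital
prop_check <- toy_prop %>%
  group_by(hospital) %>%
  summarise(check = sum(prop))
```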

Next, let’s create a bar chart using our new dataset hosp_dx_prop, with prop as our new y variable:

hosp_dx_fill <- hosp_dx_prop %>%
  ggplot(aes(x = hospital, y = prop, fill = diagnosis_type)) +
  geom_col(position = position_fill()) 

hosp_dx_fill

Now, we can use geom_text() and specify the position of the labels:

hosp_dx_fill +
  geom_text(aes(label = prop),
            position=position_fill()) 

It’s a good start, but obviously, we still have some work to do to make it look nicer!

Before adjusting our labels, let’s handle those decimals. We could reduce the number of decimals like this:

hosp_dx_fill +
  geom_text(aes(label = round(prop,2)),
            position = position_fill()) 

However, the better method is this:

hosp_dx_fill +
  geom_text(aes(label = scales::percent(prop)),
            position = position_fill()) 

The {scales} package is commonly used with {ggplot2} for customizing aesthetics, transforming axis scales, formatting labels, defining color palettes, and more.

The scales::percent(prop) function we used in the code above with geom_text() converts the proportions (values from our prop variable) into a percentage format and adds percentage signs. We can also control the number of displayed digits using the accuracy argument (see below).
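
Here is a short sketch of how the accuracy argument changes the output, using two proportions taken from the table above. It also shows label_percent(), which, as far as we know, is the formatter-building counterpart of percent() in recent versions of {scales}.

```r
library(scales)

props <- c(0.794, 0.206)

# accuracy = 1 rounds to whole percentages
whole <- percent(props, accuracy = 1)

# accuracy = 0.1 keeps one decimal place
one_decimal <- percent(props, accuracy = 0.1)

# label_percent() builds a reusable formatter with the same options
fmt <- label_percent(accuracy = 1)
from_formatter <- fmt(props)
```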

Next, we can center the labels using the vjust argument of position_fill():

hosp_dx_fill + 
  geom_text(aes(label = scales::percent(prop)), 
            position = position_fill(vjust = 0.5)) # center labels

It looks great, but we can do better! Using flipped coordinates in bar charts can greatly improve readability:

hosp_dx_fill +
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            position = position_fill(vjust = 0.5)) +
  coord_flip() 

Great, now we can add some additional aesthetic tweaks:

hosp_dx_fill +
  theme_light() +
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            position = position_fill(vjust = 0.5),
            color = "white", # Change text color
            fontface = "bold", # Make it bold
            size = 4.5) + # Change font size
  coord_flip() 

Amazing! Let’s move on to our last section where we’ll take a look at circular plots.

10 Labeling circular plots

Let’s begin by summarizing the data. We’ll calculate the total number of cases for each hospital by grouping the data based on the hospital variable and then calculating the sum of cases in each group.

total_results <- tb_outcomes %>%
  group_by(hospital) %>%
  summarise(
    total_cases = sum(cases)) 

total_results

Now that we have our new dataset, let’s start by creating a single stacked bar. You may recall from the previous lesson that a pie chart is essentially a round version of a 100% stacked bar chart.

results_stack <- ggplot(total_results,
       aes(x=4, # Set an arbitrary x value  
           y=total_cases,
           fill=hospital)) +
  geom_col()

results_stack

Now, we can create our basic pie chart. As we learned in the last lesson, to transform linear coordinates into polar coordinates, we use the coord_polar() function. The theta parameter defines which aesthetic variable should be mapped to the angular coordinate in the polar coordinate system. By specifying "y", we use the height of the bars to determine the angle of each slice in our pie chart.

outcome_pie <- results_stack +
  coord_polar(theta = "y")

outcome_pie

Great! This will serve as our base pie chart. Next, let’s create a base donut chart using xlim().

outcome_donut <- outcome_pie +
  xlim(c(0.2, 4.5))

outcome_donut

Alright, we’re ready to move on to labelling!

Let’s add labels to our pie chart using geom_text().

outcome_pie +
  geom_text(aes(label = total_cases)) 

Because coord_polar() applies to every layer, the labels are drawn in polar coordinates just like the slices. However, the numbers appear in the wrong segments because we haven’t added a position adjustment to the labeling geometry yet.

Now, just as we did previously, we will use the position_stack() function with vjust = 0.5 to center the labels within their slices.

outcome_pie +
  geom_text(aes(label = total_cases), 
            position = position_stack(vjust = 0.5)) # Center the labels

To move the labels along the x-axis of our pie chart (up and down the radius), we can specify a fixed value to the x aesthetic in geom_text().

outcome_pie +
  geom_text(aes(label = total_cases,
                x = 4.25), # move the text away from the center   
            position = position_stack(vjust = 0.5)) 

We can do the same with geom_label().

# Similar adjustment with geom_label()
outcome_pie +
  geom_label(aes(label = total_cases,
                 x = 4.7), # move the text away from the center
            position = position_stack(vjust = 0.5))

Notice that once we used geom_label(), the letter ‘a’ appeared on the legend. To fix this issue, you can add the show.legend = FALSE argument to the geom_label() function like this:

# Same plot, with the label glyph removed from the legend
outcome_pie +
  geom_label(aes(label = total_cases,
                 x = 4.7), # move the text away from the center
            position = position_stack(vjust = 0.5),
            show.legend=FALSE) # remove letter "a" from legend

Next, let’s move on to our basic donut chart. We’ll label it using geom_text() and directly specifying the position, centering our labels in the middle of each section of the chart, just as we did for our pie chart.

outcome_donut +
  geom_text(aes(label = total_cases), 
            position = position_stack(vjust = 0.5))

To finish, we can make some additional aesthetic adjustments. Here, we enhance the chart’s aesthetics by applying theme_void() to remove cluttered background elements, introducing a new color palette with scale_fill_viridis_d(), and adjusting the text labels using geom_text() with white and bold text for better visibility and contrast.

# Additional aesthetic modifications
outcome_donut +
  geom_text(aes(label = total_cases),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  theme_void() +
  scale_fill_viridis_d()

Congratulations, it looks great!
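
As a variation, the percentage formatting from the previous section can be combined with the pie chart to label each slice with its share of the total. This is only a sketch on made-up totals (`toy_totals`), not the Benin data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical totals for three facilities
toy_totals <- tibble(
  hospital = c("Hospital A", "Hospital B", "Hospital C"),
  total_cases = c(50, 30, 20)
) %>%
  mutate(share = total_cases / sum(total_cases))

toy_pie <- ggplot(toy_totals, aes(x = 4, y = total_cases, fill = hospital)) +
  geom_col() +
  # Label each slice with its share, centered within the slice
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_stack(vjust = 0.5),
            color = "white", fontface = "bold") +
  coord_polar(theta = "y") +
  theme_void()

# Labels computed for the text layer
pie_labels <- layer_data(toy_pie, 2)$label
```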

Wrap Up!

In this lesson, we delved into enhancing plots with labels, focusing on geom_label() and geom_text().

We started with geom_text(), demonstrating how to place readable text directly onto plots using the tb_outcomes dataset. Then we looked at geom_label() to create more prominent labels with background boxes, ideal for complex plot backgrounds.

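We then covered the hjust and vjust arguments, and used position adjustments such as position_stack(), position_dodge(), and position_fill() to place labels on stacked, dodged, percent-stacked, and circular plots. Along the way, we saw how flipped coordinates can make bar plots and their labels easier to read.

Together, these techniques should help you make your {ggplot2} visualizations clearer and more informative.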

References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license.